Import all necessary libraries

In [1]:

As the data is stored in an Excel file, load it with the pandas read_excel function

In [2]:
Out[2]:
ID Age Experience Income ZIP Code Family CCAvg Education Mortgage Personal Loan Securities Account CD Account Online CreditCard
0 1 25 1 49 91107 4 1.6 1 0 0 1 0 0 0
1 2 45 19 34 90089 3 1.5 1 0 0 1 0 0 0
2 3 39 15 11 94720 1 1.0 1 0 0 0 0 0 0
3 4 35 9 100 94112 1 2.7 2 0 0 0 0 0 0
4 5 35 8 45 91330 4 1.0 2 0 0 0 0 0 1

Perform Exploratory Data Analysis

In [3]:
Out[3]:
(5000, 14)
In [4]:
Out[4]:
ID                    0
Age                   0
Experience            0
Income                0
ZIP Code              0
Family                0
CCAvg                 0
Education             0
Mortgage              0
Personal Loan         0
Securities Account    0
CD Account            0
Online                0
CreditCard            0
dtype: int64
In [5]:
In [6]:
Out[6]:
Index(['Age', 'Experience', 'Income', 'Family', 'CCAvg', 'Education',
       'Mortgage', 'Personal Loan', 'Securities Account', 'CD Account',
       'Online', 'CreditCard'],
      dtype='object')
In [7]:
In [8]:
[Figure: Age, Experience, Income, Family and Education, plotted as "variable" against "value" (value axis from 0 to 200)]

The five-point summary shows that Experience contains negative values (this should be fixed).

We can see the min, max, mean and standard deviation for all key attributes of the dataset.
Income is noisy and skewed to the right; Age and Experience are roughly evenly distributed.
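The five-point summary referred to here can be obtained with `DataFrame.describe()`. A minimal sketch on a few made-up rows (illustrative values, not the full dataset) showing how a negative minimum surfaces:

```python
import pandas as pd

# Illustrative rows only; the real dataset has 5000 records.
sample = pd.DataFrame({
    "Age": [25, 45, 39, 24, 28],
    "Experience": [1, 19, 15, -1, -2],
    "Income": [49, 34, 11, 39, 48],
})

# describe() reports count, mean, std, min, quartiles and max per column.
summary = sample.describe().T
print(summary[["min", "25%", "50%", "75%", "max"]])

# Columns whose minimum is below zero need attention.
negative_min_columns = summary.index[summary["min"] < 0].tolist()
print(negative_min_columns)
```

Listing the columns with a negative minimum makes the problem visible without reading the whole table.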

Check whether the data is skewed
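One way to run this check is `DataFrame.skew()`, which returns one skewness value per numeric column. Values near 0 indicate a roughly symmetric distribution, and positive values a longer right tail. A sketch on made-up columns (the column names are hypothetical):

```python
import pandas as pd

# Two toy columns: one symmetric, one with a long right tail.
sample = pd.DataFrame({
    "symmetric": [1, 2, 3, 4, 5],
    "right_tailed": [1, 1, 1, 2, 10],
})

# Sample skewness per column.
skewness = sample.skew()
print(skewness)
```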

In [9]:
Out[9]:
Age                  -0.029341
Experience           -0.026325
Income                0.841339
Family                0.155221
CCAvg                 1.598457
Education             0.227093
Mortgage              2.104002
Personal Loan         2.743607
Securities Account    2.588268
CD Account            3.691714
Online               -0.394785
CreditCard            0.904589
dtype: float64
In [10]:
Out[10]:
Age                     int64
Experience              int64
Income                  int64
Family                  int64
CCAvg                 float64
Education               int64
Mortgage                int64
Personal Loan           int64
Securities Account      int64
CD Account              int64
Online                  int64
CreditCard              int64
dtype: object

Now visualise the skewness by plotting the distribution of each column

In [11]:
Out[11]:
array([[<AxesSubplot:title={'center':'Age'}>,
        <AxesSubplot:title={'center':'Experience'}>,
        <AxesSubplot:title={'center':'Income'}>],
       [<AxesSubplot:title={'center':'Family'}>,
        <AxesSubplot:title={'center':'CCAvg'}>,
        <AxesSubplot:title={'center':'Education'}>],
       [<AxesSubplot:title={'center':'Mortgage'}>,
        <AxesSubplot:title={'center':'Personal Loan'}>,
        <AxesSubplot:title={'center':'Securities Account'}>],
       [<AxesSubplot:title={'center':'CD Account'}>,
        <AxesSubplot:title={'center':'Online'}>,
        <AxesSubplot:title={'center':'CreditCard'}>]], dtype=object)

INFERENCE from Histogram

1. Age & Experience are, to an extent, evenly distributed
2. Income & credit card spending (CCAvg) are skewed to the right (positive skewness of 0.84 and 1.60)
3. There are more Undergraduates than Graduate or Advanced & Professional customers
4. About 60% of customers have enabled online banking and gone digital
In [12]:
In [13]:
C:\Users\cmraj\anaconda3\lib\site-packages\seaborn\distributions.py:2557: FutureWarning:

`distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms).

Out[13]:
<AxesSubplot:xlabel='Experience', ylabel='Density'>
In [14]:
Out[14]:
20.1046
In [15]:
Out[15]:
Age Experience Income Family CCAvg Education Mortgage Personal Loan Securities Account CD Account Online CreditCard
89 25 -1 113 4 2.30 3 0 0 0 0 0 1
226 24 -1 39 2 1.70 2 0 0 0 0 0 0
315 24 -2 51 3 0.30 3 0 0 0 0 1 0
451 28 -2 48 2 1.75 3 89 0 0 0 1 0
524 24 -1 75 4 0.20 1 0 0 0 0 1 0
In [16]:
Out[16]:
<AxesSubplot:xlabel='Age', ylabel='Density'>
In [17]:
Out[17]:
-1.4423076923076923
In [18]:
Out[18]:
624
In [19]:
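There are 52 records with negative values for Experience, approx. 1.04% of the 5000 rows (the 624 reported above is the cell count, 52 rows x 12 columns, not the number of records)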
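When counting affected records, note that `DataFrame.size` returns the number of cells (rows x columns), while `len()` returns the number of rows. A minimal sketch on made-up rows showing the difference:

```python
import pandas as pd

# Illustrative rows only.
sample = pd.DataFrame({
    "Age": [25, 24, 24, 28, 35],
    "Experience": [-1, -1, -2, -2, 9],
    "Income": [113, 39, 51, 48, 100],
})

negative = sample[sample["Experience"] < 0]

n_rows = len(negative)      # number of records with negative Experience
n_cells = negative.size     # rows x columns, not the record count
share = 100 * n_rows / len(sample)
print(n_rows, n_cells, share)
```

The percentage should be computed from the row count, not from `size`.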
In [20]:
In [21]:
Out[21]:
Age Experience Income Family CCAvg Education Mortgage Personal Loan Securities Account CD Account Online CreditCard
0 25 1 49 4 1.6 1 0 0 1 0 0 0
1 45 19 34 3 1.5 1 0 0 1 0 0 0
2 39 15 11 1 1.0 1 0 0 0 0 0 0
3 35 9 100 1 2.7 2 0 0 0 0 0 0
4 35 8 45 4 1.0 2 0 0 0 0 0 1
In [ ]:

Use the numpy where function to replace each negative Experience value with the mean Experience of records in the same age group
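A minimal sketch of this replacement, assuming the group mean is computed from the non-negative values only and that an age group with no valid value falls back to 0 (both are assumptions; the original cell is not shown):

```python
import numpy as np
import pandas as pd

# Illustrative rows only.
sample = pd.DataFrame({
    "Age": [25, 25, 25, 30, 30, 23],
    "Experience": [-1, 1, 3, 4, 6, -2],
})

# Mean of the valid (non-negative) Experience values within each Age group.
valid = sample["Experience"].where(sample["Experience"] >= 0)
age_group_mean = valid.groupby(sample["Age"]).transform("mean")

# Assumption: if an age group has no valid value at all, fall back to 0.
age_group_mean = age_group_mean.fillna(0)

# Keep valid values, replace negative ones with the age-group mean.
sample["Experience"] = np.where(
    sample["Experience"] < 0, age_group_mean, sample["Experience"]
)
print(sample)
```

The fallback matters because the youngest age groups may contain only negative values, in which case the group mean is undefined.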

In [22]:
In [23]:
Out[23]:
Age Experience Income Family CCAvg Education Mortgage Personal Loan Securities Account CD Account Online CreditCard
In [24]:
Out[24]:
<AxesSubplot:>

We can see that Age & Experience are very strongly correlated.

Hence it is fine to keep Age and drop Experience to avoid a multicollinearity issue.
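The correlation check and the drop can be sketched as follows. The rows are illustrative and the 0.9 cut-off is an assumption for the example, not a threshold stated in the notebook:

```python
import pandas as pd

# Illustrative rows only.
sample = pd.DataFrame({
    "Age": [25, 45, 39, 35, 35, 60],
    "Experience": [1, 19, 15, 9, 8, 35],
    "Income": [49, 34, 11, 100, 45, 80],
})

# Pearson correlation matrix of the numeric columns.
corr = sample.corr()
age_exp_corr = corr.loc["Age", "Experience"]
print(round(age_exp_corr, 3))

# Assumption: 0.9 is an illustrative cut-off for "very strongly correlated".
if abs(age_exp_corr) > 0.9:
    sample = sample.drop(columns=["Experience"])
print(sample.columns.tolist())
```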

In [25]:
In [26]:
Out[26]:
Age Income Family CCAvg Education Mortgage Personal Loan Securities Account CD Account Online CreditCard
0 25 49 4 1.6 1 0 0 1 0 0 0
1 45 34 3 1.5 1 0 0 1 0 0 0
2 39 11 1 1.0 1 0 0 0 0 0 0
3 35 100 1 2.7 2 0 0 0 0 0 0
4 35 45 4 1.0 2 0 0 0 0 0 1
In [ ]:
In [27]:
Out[27]:
array([1, 2, 3], dtype=int64)
In [28]:
In [29]:
In [30]:
Out[30]:
Age Income Family CCAvg Education Mortgage Personal Loan Securities Account CD Account Online CreditCard Edu_mark
0 25 49 4 1.6 1 0 0 1 0 0 0 Undergrad
1 45 34 3 1.5 1 0 0 1 0 0 0 Undergrad
2 39 11 1 1.0 1 0 0 0 0 0 0 Undergrad
3 35 100 1 2.7 2 0 0 0 0 0 0 Graduate
4 35 45 4 1.0 2 0 0 0 0 0 1 Graduate
In [31]:
In [32]:
Out[32]:
Edu_mark
Advanced/Professional    1501
Graduate                 1403
Undergrad                2096
Name: Age, dtype: int64
In [33]:
[Pie chart: Undergrad 41.9%, Advanced/Professional 30%, Graduate 28.1%]

Inference: There are more Undergraduates (41.92%) than Graduates (28.06%) or Advanced/Professional customers (30.02%)
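The shares quoted here follow directly from the group counts in the output above. A small sketch: the label mapping is read off the displayed rows for codes 1 and 2 and assumed for code 3, and the counts are the ones reported by the group-by:

```python
import pandas as pd

# Mapping from Education code to label (code 3 is assumed by elimination).
edu_labels = {1: "Undergrad", 2: "Graduate", 3: "Advanced/Professional"}
education = pd.Series([1, 1, 1, 2, 2, 3])
edu_mark = education.map(edu_labels)
print(edu_mark.tolist())

# Counts as reported in the Edu_mark group-by output.
counts = pd.Series(
    {"Advanced/Professional": 1501, "Graduate": 1403, "Undergrad": 2096}
)

# Percentage share of each education level.
share = (100 * counts / counts.sum()).round(2)
print(share)
```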

In [ ]:
In [34]:
Out[34]:
Index(['Age', 'Income', 'Family', 'CCAvg', 'Education', 'Mortgage',
       'Personal Loan', 'Securities Account', 'CD Account', 'Online',
       'CreditCard', 'Edu_mark'],
      dtype='object')

Let's explore the distribution of account holders

In [35]:
In [36]:
In [37]:
Out[37]:
Age Income Family CCAvg Education Mortgage Personal Loan Securities Account CD Account Online CreditCard Edu_mark Account_holder_category
0 25 49 4 1.6 1 0 0 1 0 0 0 Undergrad Holds only Securites
1 45 34 3 1.5 1 0 0 1 0 0 0 Undergrad Holds only Securites
2 39 11 1 1.0 1 0 0 0 0 0 0 Undergrad Does not Holds Securites or Deposit
3 35 100 1 2.7 2 0 0 0 0 0 0 Graduate Does not Holds Securites or Deposit
4 35 45 4 1.0 2 0 0 0 0 0 1 Graduate Does not Holds Securites or Deposit
In [38]:
Out[38]:
Index(['Does not Holds Securites or Deposit', ' Holds only Securites ',
       ' Holds only Deposit', 'Holds Securites & Deposit'],
      dtype='object')
In [ ]:
In [39]:
[Pie chart: Does not Holds Securites or Deposit 86.5%, Holds only Securites 7.5%, Holds only Deposit 3.1%, Holds Securites & Deposit 2.94%]

We can see that almost 87% of customers hold neither a securities account nor a deposit account, and about 3% hold both. Encouraging that 87% to open either type of account would improve the assets of the bank.
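A category column like this can be derived from the two account flags with `numpy.select`. This is a sketch under assumptions: the rows are made up, and the labels are spelled out cleanly rather than copied from the notebook output:

```python
import numpy as np
import pandas as pd

# Illustrative rows only.
sample = pd.DataFrame({
    "Securities Account": [1, 0, 0, 1],
    "CD Account": [0, 0, 1, 1],
})

has_sec = sample["Securities Account"] == 1
has_cd = sample["CD Account"] == 1

# Conditions are mutually exclusive; anything else gets the default label.
conditions = [has_sec & has_cd, has_sec & ~has_cd, ~has_sec & has_cd]
labels = [
    "Holds Securities & Deposit",
    "Holds only Securities",
    "Holds only Deposit",
]

sample["Account_holder_category"] = np.select(
    conditions, labels, default="Does not hold Securities or Deposit"
)

# Percentage of customers in each category.
share = sample["Account_holder_category"].value_counts(normalize=True) * 100
print(share)
```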

In [ ]:
In [40]:
Out[40]:
Index(['Age', 'Income', 'Family', 'CCAvg', 'Education', 'Mortgage',
       'Personal Loan', 'Securities Account', 'CD Account', 'Online',
       'CreditCard', 'Edu_mark', 'Account_holder_category'],
      dtype='object')
In [41]:
[Figure: Income (0 to 200) by Education (1, 2, 3), shown separately for Personal Loan=0 and Personal Loan=1]

Inference: From the above plot, the income of customers who availed a personal loan is almost the same irrespective of their education

In [ ]:
In [42]:
C:\Users\cmraj\anaconda3\lib\site-packages\seaborn\distributions.py:2557: FutureWarning:

`distplot` is a deprecated function and will be removed in a future version. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `kdeplot` (an axes-level function for kernel density plots).

Out[42]:
<matplotlib.legend.Legend at 0x1c841ec3f10>

Conclusion: Customers who have availed a personal loan seem to have a higher income than those who have not
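A numeric check can back up what the density plot suggests. The sketch below compares the mean and median income of the two groups on made-up rows, so the figures are not results from the dataset:

```python
import pandas as pd

# Illustrative rows only.
sample = pd.DataFrame({
    "Income": [49, 34, 11, 45, 150, 180, 120],
    "Personal Loan": [0, 0, 0, 0, 1, 1, 1],
})

# Mean and median Income for customers without (0) and with (1) a loan.
income_by_loan = sample.groupby("Personal Loan")["Income"].agg(["mean", "median"])
print(income_by_loan)
```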

In [ ]:

Automate the steps above

In [43]:
In [44]:
In [ ]:
In [45]:
In [46]:

Customers with a high mortgage value, i.e. more than 400K, have availed a personal loan

In [ ]:
In [47]:
Out[47]:
Index(['Age', 'Income', 'Family', 'CCAvg', 'Education', 'Mortgage',
       'Personal Loan', 'Securities Account', 'CD Account', 'Online',
       'CreditCard', 'Edu_mark', 'Account_holder_category'],
      dtype='object')
In [48]:
In [49]:

From the above graph we can infer that customers who hold a deposit account, and customers who hold neither a securities account nor a deposit account, have availed personal loans

In [ ]:

Perform Hypothesis Testing

Q. Is a person's age a factor in availing a loan? Does a person's income have an impact on availing a loan? Does family size lead customers to avail a loan?

In [50]:
C:\Users\cmraj\anaconda3\lib\site-packages\seaborn\_decorators.py:36: FutureWarning:

Pass the following variables as keyword args: x, y. From version 0.12, the only valid positional argument will be `data`, and passing other arguments without an explicit keyword will result in an error or misinterpretation.

Out[50]:
<AxesSubplot:xlabel='Age', ylabel='Personal Loan'>
In [51]:
In [52]:
In [53]:
In [54]:
Age does not have impact on availing personal loan  as the p_value is greater than 0.05 with a value of 0.584959263705325
In [ ]:

Automate the steps above
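One way to wrap such a test in a reusable function is sketched below. The notebook does not show which test was used, so this assumes Welch's two-sample t-test via `scipy.stats.ttest_ind`; the function name `loan_impact`, its parameters and the sample rows are hypothetical:

```python
import pandas as pd
from scipy import stats


def loan_impact(df, column, target="Personal Loan", alpha=0.05):
    """Welch's t-test of `column` between the two `target` groups."""
    took_loan = df.loc[df[target] == 1, column]
    no_loan = df.loc[df[target] == 0, column]
    # equal_var=False selects Welch's variant (no equal-variance assumption).
    _, p_value = stats.ttest_ind(took_loan, no_loan, equal_var=False)
    verdict = "does have" if p_value < alpha else "does not have"
    message = (
        f"{column} {verdict} impact on availing personal loan "
        f"(p_value = {p_value})"
    )
    return p_value, message


# Illustrative rows only.
sample = pd.DataFrame({
    "Age": [25, 45, 39, 35, 52, 30, 44, 38, 36, 50],
    "Income": [49, 34, 11, 45, 60, 150, 180, 120, 165, 140],
    "Personal Loan": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
})

for column in ["Age", "Income"]:
    p_value, message = loan_impact(sample, column)
    print(message)
```

Returning the p-value alongside the message keeps the function usable both for printing and for further checks.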

In [55]:
In [56]:
Age does not have impact on availing personal loan as the p_value is greater than 0.05 with a value of 0.584959263705325
In [ ]:

Q. Does a person's income have a significant impact on availing a personal loan?

In [57]:
Income does  have impact on availing personal loan, as the p_value is less than 0.05 with a value of 0.0

Income has a highly significant effect on availing a personal loan: the p-value is below 0.05 (printed as 0.0, which suggests a value too small for floating point to represent)

In [ ]:

Q. Does the number of persons in the family have a significant impact on availing a personal loan?

In [58]:
Family does  have impact on availing personal loan, as the p_value is less than 0.05 with a value of 1.4099040685673807e-05

Family size has a statistically significant effect on availing a personal loan: the p-value is below 0.05, at 1.4099040685673807e-05

In [ ]: